Data Mining in Astronomical Databases

Author

  • Kirk D. Borne
Abstract

A Virtual Observatory (VO) will enable transparent and efficient access, search, retrieval, and visualization of data across multiple data repositories, which are generally heterogeneous and distributed. Aspects of data mining that apply to a variety of science user scenarios with a VO are reviewed.

1 Science Requirements for Data Mining

What is data mining, and why is it applicable to scientific research? Data mining is defined as an information extraction activity whose goal is to discover hidden facts contained in databases. Data mining has taken the business community by storm, and there is consequently now a vast array of resources and research techniques available for exploitation by the scientific communities. It is therefore useful to examine a categorization of data mining thrusts and their sub-components, since these are likewise applicable to the scientific exploration of large astronomical databases.

Data mining is used to find patterns and relationships in data by using sophisticated techniques to build models, that is, abstract representations of reality. A good model is a useful guide to understanding that reality and to making decisions. There are two main types of data mining models: descriptive and predictive. Descriptive models describe patterns in data and are generally used to create meaningful subgroups or clusters. Predictive models are used to forecast explicit values, based upon patterns determined from known results.

There is another differentiation of data mining into two categories that we find particularly appropriate to knowledge discovery in large astronomical databases: event-based mining and relationship-based mining. At the risk of trivializing some fairly sophisticated techniques, we classify event-based mining scenarios into four orthogonal categories:

• Known events / known algorithms: use existing physical models (descriptive models) to locate known phenomena of interest either spatially or temporally within a large database.
• Known events / unknown algorithms: use pattern recognition and clustering properties of data to discover new observational (in our case, astrophysical) relationships among known phenomena.
• Unknown events / known algorithms: use expected physical relationships (predictive models) among observational parameters of astrophysical phenomena to predict the presence of previously unseen events within a large complex database.
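
The distinction between descriptive and predictive models drawn above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `kmeans_1d` and `fit_line`, the toy values, and the choice of one-dimensional k-means and a least-squares line as stand-ins for the two model types are all assumptions made for illustration.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Descriptive model (illustrative): group values around the nearest center."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def fit_line(xs, ys):
    """Predictive model (illustrative): ordinary least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data, not drawn from any real catalog.
colors = [0.2, 0.3, 0.25, 1.1, 1.2, 1.15]
centers, groups = kmeans_1d(colors, centers=[0.0, 1.0])  # describe: find subgroups

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
slope, intercept = fit_line(xs, ys)      # learn a pattern from known results
predicted = slope * 5.0 + intercept      # predict: forecast an unseen value
```

The clustering step only summarizes structure already present in the data, while the fitted line is used to forecast a value that was not observed, which is the contrast the text describes.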

Similar resources

Science User Scenarios for a Virtual Observatory Design Reference Mission: Science Requirements for Data Mining

The knowledge discovery potential of the new large astronomical databases is vast. When these are used in conjunction with the rich legacy data archives, the opportunities for scientific discovery multiply rapidly. A Virtual Observatory (VO) framework will enable transparent and efficient access, search, retrieval, and visualization of data across multiple data repositories, which are generally...

Artificial intelligence tools for data mining in large astronomical databases

The federation of heterogeneous large astronomical databases foreseen in the framework of the AVO and NVO projects will pose unprecedented data mining and visualization problems which may find a rather natural and user friendly answer in artificial intelligence (A.I.) tools based on neural networks, fuzzy-C sets or genetic algorithms. We shortly describe some tools implemented by the AstroNeura...

Distributed Information Search and Retrieval for Astronomical Resource Discovery and Data Mining

Information search and retrieval has become by nature a distributed task. We look at tools and techniques which are of importance in this area. Current technological evolution can be summarized as the growing stability and cohesiveness of distributed architectures of searchable objects. The objects themselves are more often than not multimedia, including published articles or grey literature re...

Automated Clustering Algorithms for Classification of Astronomical Objects

Data mining is an important and challenging problem for the efficient analysis of large astronomical databases and will become even more important with the development of the Global Virtual Observatory. In this study, learning vector quantization (LVQ), single-layer perceptron (SLP) and support vector machines (SVM) were put forward for multi-wavelength data classification. A feature selection ...

Journal:
  • CoRR

Volume: astro-ph/0010583

Publication date: 2000